New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dra scheduler: ensure that we never have nil claim/class parameters #123828
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @pohly |
I'm realizing this is not sufficient. I think we need a fully populated "default" claim parameters object that requests a single resource with no configuration parameters. |
e976d81
to
917fa21
Compare
603f049
to
e69c531
Compare
I have currently hard-coded a "default" claim parameters object in the scheduler code. This won't work going forward as we add new structured models though. What we probably want is a field in the |
I wasn't sure whether we want that. The E2E tests use
... because without claim parameters, the scheduler doesn't know the claim is for a resource using model "named resources" or some future "some-other-resources". Makes sense. As a stop-gap solution this is okay. /lgtm For approval. |
LGTM label has been added. Git tree hash: da955e7c2a17de521426b6f9513850323e009ed1
|
e69c531
to
d067793
Compare
I'm looking into implementing a custom driver and noticed this issue. I like the defaulting idea, if you folks are fine with it, I can take care of it. |
I reversed the order so that the other one can merge without further approval. It just needs a new LGTM (and to be added to the milestone of course). |
@ravisantoshgudimetla: let's discuss in #123858. |
Previously we were returning the error string from 'err' (which is nil), when we should have been returning it from result.Error. Without this it is hard to debug issues with NodeUnprepareResources. Signed-off-by: Kevin Klues <kklues@nvidia.com>
for _, f := range classParameters.Filters { | ||
if f.DriverName == driverName && f.ResourceFilterModel.NamedResources != nil { | ||
filter = f.ResourceFilterModel.NamedResources | ||
break | ||
} | ||
} | ||
controller, err := namedresourcesmodel.NewClaimController(filter, perDriver.requests) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was the existing behavior equivalent in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks the same to me. classParameters
is now guaranteed to not be nil (a case that I had considered, in contrast to nil claim parameters), so the if check is not needed anymore. If it had been nil before, it's now empty, so the for loop doesn't do anything, just as before.
d067793
to
538b7ab
Compare
Without this, the scheduler was crashing in newClaimController() in pkg/scheduler/framework/plugins/dynamicresources/structuredparameters.go The code in newClaimController() assumes that the parameters are not nil. Furthermore it assumes that there is at least one DriverRequest populated in order to allocate any resources to a claim. This PR adds logic to define default claim/class parameters that will allow allocation to proceed even if an end user doesn't provide any class or claim parameters themselves. Signed-off-by: Kevin Klues <kklues@nvidia.com>
// match the default to how the resources of this | ||
// class are being advertized in a ResourceSlice. | ||
NamedResources: &resourcev1alpha2.NamedResourcesRequest{ | ||
Selector: "true", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like that we would have to parse this selector. I would imagine that a case of "no parameters" is fairly common, so we should optimize for by understanding nil
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that the scheduler doesn't inherently know which resource request model a driver is using to advertise its resources. We only know this by parsing this object and seeing which field is set (in this case NamedResources
).
As such, I think a better approach would be to allow a cluster admin to define a default ResourceClaimParameters object and link it to a resource class. However, this is a larger chunk of work (and we are past the code freeze), so I proposed adding this PR as a stop-gap for 1.30.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that the scheduler doesn't inherently know which resource request model a driver is using.
Does it matter? The point is that it passes the "filter", regardless of the model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll let @pohly comment on that as I'm not intimately familiar with all of the code in this plugin. All I know is some default should be set and the default should allow sharing. Whether that's through an explicit object or a branch off of detecting nil
I'll leave up to Patrick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the approach in this PR better because the rest of the code currently expects to determine the model and thus the resources based on the parameters. Making it handle nil adds more special cases.
There's also a clear path towards what we really want (configurable default parameters). We need those because detecting the model may not be enough, we also need to figure out what the filters for a claim with no parameters should be. This depends on the driver and/or cluster, not the model.
#123858 is tracking this next step.
538b7ab
to
21a0dd1
Compare
/lgtm |
LGTM label has been added. Git tree hash: a55bff3202752fa89898e36a5714630b01d5ba17
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, klueska The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I am adding an E2E test for this. Let's prioritize adding the fix first. |
/cc @kubernetes/release-team Another bug fix for a new feature in 1.30, please add the milestone. |
@pohly: GitHub didn't allow me to request PR reviews from the following users: kubernetes/release-team. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/milestone v1.30 |
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR adds logic to define default claim/class parameters that will allow
claim allocation to proceed even if an end user doesn't provide claim
parameters themselves.
Without this change, the scheduler was crashing in newClaimController() in
pkg/scheduler/framework/plugins/dynamicresources/structuredparameters.go
The code in newClaimController() assumes that claim parameters are not nil.
Furthermore it assumes that there is at least one DriverRequest populated in
order to allocate any resources to a claim.
However, we shouldn't force a user to supply a ResourceClaimParameters
object if they don't want to. We also shouldn't force a driver implementor to
write a controller just to associate a static ResourceClaimParameters object
with every claim.
In this PR, the "default" claim parameters objects are hard-coded in the
scheduler code. This won't work going forward as we add new structured
models though. What we probably want is a field in the ResourceClass which
points to a default ResourceClaimPrameters object that should be used if one
is not supplied as part of the claim.